which proxies to train against.
https://www.lesswrong.com/posts/G9HdpyREaCbFJjKu5/it-is-reasonable-to-research-how-to-use-model-internals-in?commentId=krg2jzDxXhei9vNLj
and Daniel Kokotajlo’s comment about preserving at least one output stream that isn’t optimised against (this could be activations, while doing CoT + output monitoring)